

LCA: Loss Change Allocation for Neural Network Training

Neural Information Processing Systems

Neural networks enjoy widespread use, but many aspects of their training, representation, and operation are poorly understood. In particular, our view into the training process is limited, with a single scalar loss being the most common viewport into this high-dimensional, dynamic process. We propose a new window into training called Loss Change Allocation (LCA), in which credit for changes to the network loss is conservatively partitioned to the parameters. This measurement is accomplished by decomposing the components of an approximate path integral along the training trajectory using a Runge-Kutta integrator. This rich view shows which parameters are responsible for decreasing or increasing the loss during training, or which parameters help or hurt the network's learning, respectively. LCA may be summed over training iterations and/or over neurons, channels, or layers for increasingly coarse views. This new measurement device produces several insights into training.
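As a rough illustration of the idea (a toy sketch, not the paper's implementation), the path integral underlying LCA can be approximated per parameter by evaluating the gradient along the segment between consecutive iterates. On a quadratic loss, a midpoint (second-order Runge-Kutta-style) rule already sums exactly to the observed loss change; all names below are illustrative:

```python
import numpy as np

# Toy quadratic loss: L(theta) = 0.5 * ||theta||^2, so grad(theta) = theta.
def loss(theta):
    return 0.5 * np.sum(theta ** 2)

def grad(theta):
    return theta

# One SGD step on the toy loss.
theta0 = np.array([1.0, -2.0, 0.5])
lr = 0.1
theta1 = theta0 - lr * grad(theta0)
delta = theta1 - theta0

# First-order allocation: credit each parameter i with grad_i * delta_i.
lca_first_order = grad(theta0) * delta

# Midpoint refinement: evaluate the gradient halfway along the path,
# giving a better approximation of the per-parameter path integral.
lca_midpoint = grad(0.5 * (theta0 + theta1)) * delta

# "Grounded": the per-parameter allocations sum to the real loss change.
true_change = loss(theta1) - loss(theta0)
print(lca_midpoint.sum(), true_change)
```

For this quadratic loss the midpoint sum matches the true loss change exactly; on a real network the paper uses higher-order integration to keep the residual small.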



Reviews: LCA: Loss Change Allocation for Neural Network Training

Neural Information Processing Systems

Originality: While many works have studied the properties of the endpoints found by SGD, the literature examining SGD training dynamics in the context of deep neural networks is sparser, and the loss contribution metric appears novel to me. The paper is therefore original in that respect. Quality: The paper is in general of good quality. However, a few specific points could be improved:
- It would be nice to characterize the approximation errors introduced by the first-order Taylor expansion.
- The authors claim that the loss contribution is grounded while other Fisher-information-based metrics depend heavily on the chosen parametrization. Could the authors expand on this point and provide a more detailed comparison between LC and the metrics introduced in [1] and [13]?
- In the introduction, the authors claim that entire layers drift in the wrong direction during training.


Reviews: LCA: Loss Change Allocation for Neural Network Training

Neural Information Processing Systems

There is some disagreement about this paper among reviewers. There is a common appreciation for this line of study and specifically the new loss contribution (LC) metric proposed. As many things about the training process of DNNs remain "mysterious", developing new and better "lenses" through which we can look at the inner workings of a DNN can be of great value for the field. The criticism in the less enthusiastic reviews is largely around "more effort": comparison to other approaches, more experiments, clarifications and improvements, making it more actionable. One can also give that a positive spin: there is a lot of interesting follow-up work to be done here.




Introducing LCA: Loss Change Allocation for Neural Network Training

#artificialintelligence

LCA components have the great property of being grounded, meaning that they sum to real changes in the loss (with some modifications of the approximation method to take curvature into account and guarantee accuracy, as explained fully in our paper). If we sum over parameters, we get the total change in loss at each iteration, and if we sum over iterations, we get the total LCA of each parameter.
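A minimal sketch of this bookkeeping, using a hypothetical LCA matrix with one row per iteration and one column per parameter (the array and its values are illustrative, not from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
T, P = 5, 4  # iterations, parameters (toy sizes)

# Hypothetical per-iteration, per-parameter LCA values.
lca = rng.normal(size=(T, P)) * 0.01

per_iteration_change = lca.sum(axis=1)  # total loss change at each iteration
per_parameter_total = lca.sum(axis=0)   # each parameter's total LCA

# Both aggregations account for the same overall loss change.
assert np.isclose(per_iteration_change.sum(), per_parameter_total.sum())
```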


LCA: Loss Change Allocation for Neural Network Training

Lan, Janice, Liu, Rosanne, Zhou, Hattie, Yosinski, Jason

arXiv.org Machine Learning

Neural networks enjoy widespread use, but many aspects of their training, representation, and operation are poorly understood. In particular, our view into the training process is limited, with a single scalar loss being the most common viewport into this high-dimensional, dynamic process. We propose a new window into training called Loss Change Allocation (LCA), in which credit for changes to the network loss is conservatively partitioned to the parameters. This measurement is accomplished by decomposing the components of an approximate path integral along the training trajectory using a Runge-Kutta integrator. This rich view shows which parameters are responsible for decreasing or increasing the loss during training, or which parameters "help" or "hurt" the network's learning, respectively. LCA may be summed over training iterations and/or over neurons, channels, or layers for increasingly coarse views. This new measurement device produces several insights into training. (1) We find that barely over 50% of parameters help during any given iteration. (2) Some entire layers hurt overall, moving on average against the training gradient, a phenomenon we hypothesize may be due to phase lag in an oscillatory training process. (3) Finally, increments in learning proceed in a synchronized manner across layers, often peaking on identical iterations.
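Finding (1) amounts to counting, at each iteration, the parameters whose LCA is negative (loss-decreasing). A toy sketch with synthetic LCA values (the distribution below is made up for illustration) shows how such a fraction would be measured:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic LCA matrix, 100 iterations x 1000 parameters. Negative
# entries decrease the loss ("help"); the slight negative shift mimics
# a net-improving training run.
lca = rng.normal(loc=-0.0001, scale=0.01, size=(100, 1000))

# Fraction of parameters helping at each iteration.
helping_fraction = (lca < 0).mean(axis=1)
print(helping_fraction.mean())  # hovers only slightly above 0.5 here
```

The near-even split in this toy setup mirrors the paper's observation that helpful and harmful parameter movements are nearly balanced at every step.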